INFERENCE FOR MULTIPLE LINEAR REGRESSION

Recall Terminology:
$p$ predictors $x_1, x_2, \dots, x_p$ (some might be indicator variables for categorical variables)
$k-1$ non-constant terms $u_1, u_2, \dots, u_{k-1}$
Each $u_j$ is a function of $x_1, x_2, \dots, x_p$: $u_j = u_j(x_1, x_2, \dots, x_p)$
For convenience, we often set $u_0 = 1$ (constant function/term), so that
$u = (u_0, u_1, \dots, u_{k-1})^T = (1, u_1, \dots, u_{k-1})^T$

Assumptions so far:
1) $E(Y|x)$ (or $E(Y|u)$) $= \beta_0 + \beta_1 u_1 + \dots + \beta_{k-1} u_{k-1} = \beta^T u$ (Linear Mean Function)
2) $\mathrm{Var}(Y|x)$ (or $\mathrm{Var}(Y|u)$) $= \sigma^2$ (Constant Variance)

Additional Terminology (similar to simple linear regression):
$\hat{y}_i = \hat{\beta}^T u_i$ ($i$th fitted value, or $i$th fit)
$\hat{e}_i = y_i - \hat{y}_i$ ($i$th residual)
$\mathrm{RSS} = \mathrm{RSS}(\hat{\beta}) = \sum (y_i - \hat{y}_i)^2 = \sum \hat{e}_i^2$ (residual sum of squares)
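To make the terminology concrete, here is a minimal numerical sketch in Python (numpy). The data are synthetic and the predictors, terms, and coefficient values are invented for illustration; they are not from the notes.

import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)            # p = 2 predictors
x2 = rng.normal(size=n)

# k - 1 = 3 non-constant terms u_j = u_j(x1, x2), plus the constant u_0 = 1
U = np.column_stack([np.ones(n), x1, x2, x1 * x2])   # n x k matrix of terms, k = 4

beta_true = np.array([1.0, 2.0, -1.0, 0.5])          # invented coefficients
y = U @ beta_true + rng.normal(scale=0.3, size=n)

# Least squares: beta_hat minimizes RSS(beta) = sum_i (y_i - beta^T u_i)^2
beta_hat, *_ = np.linalg.lstsq(U, y, rcond=None)

y_fit = U @ beta_hat        # i-th fitted value: beta_hat^T u_i
e_hat = y - y_fit           # i-th residual
RSS = np.sum(e_hat ** 2)    # residual sum of squares
print(beta_hat.round(2), RSS.round(2))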
Results from Assumptions (1) and (2) (similar to simple linear regression):
$\hat{\beta}_j$ is an unbiased estimator of $\beta_j$
$\hat{\sigma}^2 = \frac{1}{n-k}\mathrm{RSS}$ is an unbiased estimator of $\sigma^2$
Note: In simple regression, $k = 2$.

Example: Haystacks

Additional Assumptions Needed for Inference:
3) $Y|x$ is normally distributed (recall that this will be the case if $X, Y$ are multivariate normal)
4) The $y_i$'s are independent observations from the $Y|x_i$'s

Consequences of Assumptions (1) - (4) for Inference for Coefficients:
$Y|x \sim N(\beta^T u, \sigma^2)$
$\hat{\sigma}^2$ is a multiple of a $\chi^2$ random variable with $n-k$ degrees of freedom -- so we say $\hat{\sigma}^2$ and RSS have df $= n-k$
There is a formula for $se(\hat{\beta}_j)$ (we'll use software to calculate it)
$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t(n-k)$ for each $j$
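A sketch of these coefficient results on the same kind of synthetic setup. The notes leave $se(\hat{\beta}_j)$ to software; for concreteness, the code below uses the standard closed form (the square root of the $j$th diagonal entry of $\hat{\sigma}^2 (U^T U)^{-1}$), which is what the software computes.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
U = np.column_stack([np.ones(n), x1, x2, x1 * x2])   # n x k matrix of terms
k = U.shape[1]
y = U @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(U.T @ U, U.T @ y)
RSS = np.sum((y - U @ beta_hat) ** 2)
sigma2_hat = RSS / (n - k)            # unbiased estimate of sigma^2, df = n - k

# Standard errors: sqrt of the diagonal of sigma2_hat * (U^T U)^{-1}
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(U.T @ U)))

t_stats = beta_hat / se               # tests of H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - k)
print(t_stats.round(2), p_values.round(4))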
Note: The consequences listed above are also valid replacing (3) by the weaker assumption that $Y|x_i$ is normally distributed for $i = 1, 2, \dots, n$. If the $Y|x_i$'s are not normal, but are not too ill-behaved and $n$ is large enough, the consequences above are still approximately true, thanks to the CLT.

Example: Haystacks

Caution: Multiple Testing

Recall: If you set an $\alpha$ level for hypothesis tests, then a p-value less than $\alpha$ tells you that (at least) one of the following holds:
i) The model does not fit
ii) The null hypothesis is false
iii) The sample at hand is one of the less than $100\alpha$ percent of samples for which you would falsely reject the null hypothesis
If you are doing two hypothesis tests with the same data:
There is no guarantee that the bad samples (for which you falsely reject the null) are the same for both tests.
In general, the probability of falsely rejecting one of the two null hypotheses is greater than $\alpha$.

When doing two hypothesis tests with the same data, you typically need an overall significance level $\alpha$: that is, you want to be able to say that, if the model fits and both null hypotheses are true, then the probability of falsely rejecting at least one of the two null hypotheses using your decision rule is $\alpha$. To do this, you typically need lower significance levels for each test individually.

One way to be sure of having an overall significance level $\alpha$ when doing $k$ hypothesis tests with the same data is the Bonferroni method: require significance level $\alpha/k$ for each test individually (a short sketch follows this passage). There are various other methods that allow individual significance levels higher than $\alpha/k$, but they only apply in specific situations.

For this reason, in model-building in regression, p-values for hypothesis tests are often interpreted as just loose guides for what might or might not be reasonable.
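A minimal sketch of the Bonferroni rule. The p-values are hypothetical, and m is the number of tests (written k in the notes, but renamed here to avoid clashing with the number of regression terms).

# Bonferroni: with m tests and overall level alpha, test each at alpha/m.
alpha = 0.05
p_values = [0.030, 0.008, 0.200]      # hypothetical p-values from m = 3 tests
m = len(p_values)
reject = [p < alpha / m for p in p_values]
print(reject)                          # [False, True, False]: only the 2nd rejects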
A similar situation holds for confidence intervals: to be able to say, "We have produced these two intervals by a procedure which, for 95% of all suitable samples, produces a first interval containing $\beta_0$ and a second interval containing $\beta_1$" (i.e., if you want an overall confidence level of 95%), the two individual confidence intervals need to have individual confidence level greater than 95%.

Bonferroni will also work here: requiring individual confidence levels of 97.5% will suffice to give overall confidence level 95% for two confidence intervals.

In regression, we can also use confidence regions; see Section 10.8 for more details.

Inference for Means:

Recall from simple regression:
$\mathrm{Var}(\hat{E}(Y|x)) = \mathrm{Var}(\hat{E}(Y|x) \mid x_1, \dots, x_n) = \sigma^2 \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{SXX} \right)$
$se(\hat{E}(Y|x)) = \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SXX}} = \hat{\sigma}\sqrt{h(x)}$
(a function of $x$ and the $x_i$'s, but not the $y_i$'s)

An analogous computation (best done by matrices -- see Section 7.9) in the multiple regression model gives
$\mathrm{Var}(\hat{E}(Y|x)) = \mathrm{Var}(\hat{E}(Y|x) \mid x_1, \dots, x_n) = h\sigma^2$,
where $h = h(u)$ ($= h(x)$ by abuse of notation) is a function of $u_1, u_2, \dots, u_n$, called the leverage.
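The notes defer the matrix computation to Section 7.9. For a concrete sketch, the standard matrix form of the leverage is $h(u) = u^T (U^T U)^{-1} u$, computed below on the same synthetic setup as the earlier sketches (names are illustrative).

import numpy as np

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
U = np.column_stack([np.ones(n), x1, x2, x1 * x2])   # n x k matrix of terms

UtU_inv = np.linalg.inv(U.T @ U)

def leverage(u):
    # h(u) = u^T (U^T U)^{-1} u, so that Var(E_hat(Y|x)) = h(u) * sigma^2
    return u @ UtU_inv @ u

u_center = U.mean(axis=0)                          # centroid of the observed terms
u_far = u_center + np.array([0.0, 3.0, 3.0, 9.0])  # a point far from the centroid
print(leverage(u_center), leverage(u_far))         # leverage grows with distance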
In simple regression, $h(x) = \frac{1}{n} + \frac{(x - \bar{x})^2}{SXX}$. Note that $(x - \bar{x})^2$ (hence also $h(x)$) is a (non-linear) measure of the distance from $x$ to $\bar{x}$.

Similarly, in multiple regression, $h(x)$ is a type of measure of the distance from $u$ to the centroid $\bar{u} = (1, \bar{u}_1, \dots, \bar{u}_{k-1})^T$; i.e., it is a monotone function of $\sum_j (u_j - \bar{u}_j)^2$. In particular: the further $u$ is from $\bar{u}$, the larger $\mathrm{Var}(\hat{E}(Y|x))$ is, so the less precisely we can estimate $E(Y|x)$ or $y$. For example, an x-outlier could give a large $h$, and hence make inference less precise.

Define: $se(\hat{E}(Y|x)) = \hat{\sigma}\sqrt{h(u)}$

Summarize:
The larger the leverage, the larger $se(\hat{E}(Y|x))$ is, so the less precisely we can estimate $E(Y|x)$.
The leverage depends just on the $x_i$'s, not on the $y_i$'s.

Similarly to simple regression:
The sampling distribution of $\hat{E}(Y|x)$ is normal.
$\frac{\hat{E}(Y|x) - E(Y|x)}{se(\hat{E}(Y|x))} \sim t(n-k)$
Thus we can do hypothesis tests and find confidence intervals for the conditional mean response $E(Y|x)$.

Example: 1 predictor
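For concreteness, a sketch of such a confidence interval for $E(Y|x)$ built from the $t(n-k)$ result above. The data are synthetic as in the earlier sketches, and the evaluation point u_new is hypothetical.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
U = np.column_stack([np.ones(n), x1, x2, x1 * x2])
k = U.shape[1]
y = U @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

UtU_inv = np.linalg.inv(U.T @ U)
beta_hat = UtU_inv @ (U.T @ y)
sigma_hat = np.sqrt(np.sum((y - U @ beta_hat) ** 2) / (n - k))

u_new = np.array([1.0, 0.5, -0.5, -0.25])   # terms at a hypothetical new x
fit = u_new @ beta_hat                      # E_hat(Y|x)
se_fit = sigma_hat * np.sqrt(u_new @ UtU_inv @ u_new)   # sigma_hat * sqrt(h(u))

t_crit = stats.t.ppf(0.975, df=n - k)       # two-sided 95%, df = n - k
print(fit - t_crit * se_fit, fit + t_crit * se_fit)     # 95% CI for E(Y|x)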
Again:
The consequences listed above are also valid replacing (3) by the weaker assumption that $Y|x_i$ is normally distributed for $i = 1, 2, \dots, n$.
If the $Y|x_i$'s are not normal, but are not too ill-behaved and $n$ is large enough, the consequences above are still approximately true, thanks to the CLT.

Prediction: Results are similar to simple regression:
Prediction error $= Y|x - \hat{E}(Y|x)$
$\mathrm{Var}(Y|x - \hat{E}(Y|x)) = \sigma^2 (1 + h(u))$
Define $se(Y_{pred}|x) = \hat{\sigma}\sqrt{1 + h(u)}$
$\frac{Y|x - \hat{E}(Y|x)}{se(Y_{pred}|x)} \sim t(n-k)$, so we can form prediction intervals.

Caution: As with simple regression, for prediction we need the assumption that $Y|x$ is normal (or very close to normal, with approximate results).

Example: Haystacks
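To close, a matching sketch of a 95% prediction interval (same synthetic setup as the earlier sketches; u_new is a hypothetical point). Note that $se(Y_{pred}|x)$ exceeds $se(\hat{E}(Y|x))$ because of the extra 1 under the square root, so prediction intervals are wider than confidence intervals for the mean.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
U = np.column_stack([np.ones(n), x1, x2, x1 * x2])
k = U.shape[1]
y = U @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

UtU_inv = np.linalg.inv(U.T @ U)
beta_hat = UtU_inv @ (U.T @ y)
sigma_hat = np.sqrt(np.sum((y - U @ beta_hat) ** 2) / (n - k))

u_new = np.array([1.0, 0.5, -0.5, -0.25])   # terms at a hypothetical new x
fit = u_new @ beta_hat
h = u_new @ UtU_inv @ u_new                 # leverage h(u)
se_pred = sigma_hat * np.sqrt(1 + h)        # se(Y_pred|x)

t_crit = stats.t.ppf(0.975, df=n - k)
print(fit - t_crit * se_pred, fit + t_crit * se_pred)   # 95% prediction interval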